How can we extend a pre-trained model to many language understanding tasks without labeled or additional unlabeled data? Pre-trained language models (PLMs) have been effective for a wide range of NLP tasks. However, existing approaches require either fine-tuning on downstream labeled datasets or manually constructed prompts. In this paper, we propose nonparametric prompting PLM (NPPrompt) for fully zero-shot language understanding. Unlike previous methods, NPPrompt uses only pre-trained language models: it requires neither labeled data nor an additional raw corpus for further fine-tuning, and it does not rely on humans to construct a comprehensive set of prompt label words. We evaluate NPPrompt against major previous few-shot and zero-shot learning methods on diverse NLP tasks, including text classification, textual entailment, similar text retrieval, and paraphrasing. Experimental results demonstrate that NPPrompt outperforms the previous best fully zero-shot method by large margins, with absolute gains of 12.8% in accuracy on text classification and 18.9% on the GLUE benchmark.
Depth estimation models are now widely used on mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking, and many other mobile tasks. It is therefore crucial to have efficient and accurate depth estimation models that run fast on low-power mobile chipsets. In this Mobile AI challenge, the target was to develop deep learning-based single-image depth estimation solutions that achieve real-time performance on IoT platforms and smartphones. For this, the participants used a large-scale RGB-to-depth dataset collected with the ZED stereo camera, which is capable of generating depth maps for objects located up to 50 meters away. The runtime of all models was evaluated on the Raspberry Pi 4 platform, where the developed solutions were able to generate VGA-resolution depth maps at up to 27 FPS while achieving high-fidelity results. All models developed in the challenge are also compatible with any Android or Linux-based mobile device; their detailed description is provided in this paper.
We study the composition style in deep image matting, a notion that characterizes a data generation flow on how to exploit limited foregrounds and random backgrounds to form a training dataset. Prior art executes this flow in a completely random manner by simply going through the foreground pool or by optionally combining two foregrounds before foreground-background composition. In this work, we first show that naive foreground combination can be problematic and therefore derive an alternative formulation to reasonably combine foregrounds. Our second contribution is an observation that matting performance can benefit from a certain occurrence frequency of combined foregrounds and their associated source foregrounds during training. Inspired by this, we introduce a novel composition style that binds the source and combined foregrounds in a definite triplet. In addition, we also find that different orders of foreground combination lead to different foreground patterns, which further inspires a quadruplet-based composition style. Results under controlled experiments on four matting baselines show that our composition styles outperform existing ones and invite consistent performance improvement on both composited and real-world datasets. Code is available at: https://github.com/coconuthust/composition_styles
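The foreground-background composition step that the abstract above refers to is conventionally the standard alpha compositing equation, I = αF + (1 − α)B. The sketch below illustrates only that textbook equation for a single pixel; it is not the paper's foreground-combination formulation, and the function name `composite_pixel` is a hypothetical name chosen for illustration.

```python
def composite_pixel(fg, bg, alpha):
    """Blend one foreground pixel over one background pixel.

    Textbook alpha compositing: I = alpha * F + (1 - alpha) * B.
    `fg` and `bg` are equal-length tuples of channel values and
    `alpha` is the foreground opacity in [0, 1].
    This is an illustrative helper, not code from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(fg) != len(bg):
        raise ValueError("fg and bg must have the same number of channels")
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


if __name__ == "__main__":
    red = (1.0, 0.0, 0.0)
    blue = (0.0, 0.0, 1.0)
    # A foreground that is 25% opaque lets most of the background through.
    print(composite_pixel(red, blue, 0.25))
```

With α = 1 the result is the foreground pixel and with α = 0 it is the background pixel, which is why a training set can be formed by pasting a limited pool of foregrounds onto arbitrary backgrounds.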
Recently, there has been increasing interest in synthesizing data to improve downstream text-to-SQL tasks. In this paper, we first examined the existing synthesized datasets and discovered that state-of-the-art text-to-SQL algorithms did not further improve on popular benchmarks when trained with augmented synthetic data. We observed two shortcomings: illogical synthetic SQL queries from independent column sampling and arbitrary table joins. To address these issues, we propose a novel synthesis framework that incorporates key relationships from schema, imposes strong typing, and conducts schema-distance-weighted column sampling. We also adopt an intermediate representation (IR) for the SQL-to-text task to further improve the quality of the generated natural language questions. When existing powerful semantic parsers are pre-finetuned on our high-quality synthesized data, our experiments show that these models have significant accuracy boosts on popular benchmarks, including new state-of-the-art performance on Spider.
As a novel distributed learning paradigm, federated learning (FL) faces serious challenges in dealing with massive numbers of clients that have heterogeneous data distributions and heterogeneous computation and communication resources. Various client-variance-reduction schemes and client sampling strategies have been introduced to improve the robustness of FL. Among others, primal-dual algorithms such as the alternating direction method of multipliers (ADMM) have been found to be resilient to data distribution and to outperform most primal-only FL algorithms. However, the reason behind this remains a mystery. In this paper, we first reveal that federated ADMM is essentially a client-variance-reduced algorithm. While this explains the inherent robustness of federated ADMM, its vanilla version cannot adapt to the degree of client heterogeneity. Moreover, the global model at the server under client sampling is biased, which slows down practical convergence. To go beyond ADMM, we propose a novel primal-dual FL algorithm, termed FedVRA, that allows one to adaptively control the variance-reduction level and the bias of the global model. In addition, FedVRA unifies several representative FL algorithms in the sense that they are either special instances of FedVRA or close to it. Extensions of FedVRA to semi-supervised and unsupervised learning are also presented. Experiments on (semi-)supervised image classification tasks demonstrate the superiority of FedVRA over existing schemes in learning scenarios with massive heterogeneous clients and client sampling.
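Recent research on text-to-SQL semantic parsing relies either on the parser itself or on simple heuristics to understand the natural language query (NLQ). When synthesizing a SQL query, no explicit semantic information about the NLQ is available to the parser, which leads to poor generalization performance. In addition, without lexical-level fine-grained query understanding, linking between the query and the database can only rely on fuzzy string matching, which leads to suboptimal performance in real applications. In view of this, we present a general-purpose, modular neural semantic parsing framework based on token-level fine-grained query understanding. Our framework consists of three modules: a named entity recognizer (NER), a neural entity linker (NEL), and a neural semantic parser (NSP). By jointly modeling the query and the database, the NER model analyzes user intent and identifies entities in the query. The NEL model links typed entities to schema and cell values in the database. The parser model leverages the available semantic information and linking results, and synthesizes tree-structured SQL queries based on a dynamically generated grammar. Experiments on SQUALL, a newly released semantic parsing dataset, show that we achieve 56.8% execution accuracy on the WikiTableQuestions (WTQ) test set, outperforming the state-of-the-art model by 2.7%.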
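We introduce point affiliation into feature upsampling, a notion that describes the affiliation of each upsampled point to a semantic cluster formed by local decoder feature points with semantic similarity. By rethinking point affiliation, we present a generic formulation for generating upsampling kernels. The kernels encourage not only semantic smoothness but also boundary sharpness in the upsampled feature maps. Such properties are particularly useful for dense prediction tasks such as semantic segmentation. The key idea of our formulation is to generate similarity-aware kernels by comparing the similarity between each encoder feature point and the spatially associated local region of decoder features. In this way, the encoder feature point can serve as a cue that informs the semantic cluster of the upsampled feature points. To embody the formulation, we further instantiate a lightweight upsampling operator, termed Similarity-Aware Point Affiliation (SAPA), and investigate its variants. SAPA yields consistent performance improvements on a number of dense prediction tasks, including semantic segmentation, object detection, depth estimation, and image matting. Code is available at: https://github.com/poppinace/sapa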
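Skip connections are fundamental units in encoder-decoder networks and improve feature propagation in neural networks. However, most methods with skip connections only connect features of the same resolution in the encoder and the decoder, which ignores the information lost in the encoder as the layers go deeper. To make use of the information lost from features in the shallower layers of the encoder, we propose a full skip connection network (FSCN) for the monocular depth estimation task. In addition, to fuse the features within skip connections more closely, we present an adaptive concatenation module (ACM). Furthermore, we conduct extensive experiments with FSCN on outdoor and indoor datasets (i.e., the KITTI dataset and the NYU Depth dataset).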
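Neural Radiance Field (NeRF) and its variants have achieved great success in representing 3D scenes and synthesizing photo-realistic novel views. However, they are generally based on the pinhole camera model and assume all-in-focus inputs. This limits their applicability, as images captured in the real world often have a finite depth of field (DoF). To mitigate this issue, we introduce DoF-NeRF, a novel neural rendering approach that can handle shallow DoF inputs and simulate the DoF effect. In particular, it extends NeRF to simulate the aperture of a lens following the principles of geometric optics. Such a physical guarantee allows DoF-NeRF to operate on views with different focus configurations. Benefiting from explicit aperture modeling, DoF-NeRF also enables direct manipulation of the DoF effect by adjusting virtual aperture and focus parameters. It is plug-and-play and can be inserted into NeRF-based frameworks. Experiments on synthetic and real-world datasets show that DoF-NeRF not only performs comparably with NeRF in the all-in-focus setting, but can also synthesize all-in-focus novel views conditioned on shallow DoF inputs. An interesting application of DoF-NeRF to DoF rendering is also demonstrated. The source code will be made available at https://github.com/zijinwuzijin/dof-nerf.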
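As background for the geometric-optics aperture model mentioned in the DoF-NeRF abstract above, the sketch below computes the textbook thin-lens circle of confusion, that is, the blur diameter on the sensor for a point that is out of focus. It is an assumption-laden illustration of the underlying optics only, not the rendering procedure of DoF-NeRF; the function and parameter names are hypothetical, and all lengths are assumed to share one unit.

```python
def circle_of_confusion(aperture, focal_length, focus_dist, object_dist):
    """Blur-circle diameter on the sensor under the thin-lens model.

    aperture:     aperture diameter A
    focal_length: lens focal length f
    focus_dist:   distance d_f at which the lens is focused (d_f > f)
    object_dist:  distance d of the point being imaged (d > 0)

    Textbook result: c = A * f * |d - d_f| / (d * (d_f - f)).
    Illustrative helper only; not taken from the DoF-NeRF code.
    """
    if focus_dist <= focal_length:
        raise ValueError("focus_dist must exceed focal_length")
    if object_dist <= 0:
        raise ValueError("object_dist must be positive")
    defocus = abs(object_dist - focus_dist)
    return aperture * focal_length * defocus / (
        object_dist * (focus_dist - focal_length)
    )


if __name__ == "__main__":
    # A 50 mm lens at f/2 (25 mm aperture) focused at 2 m, in metres.
    for d in (1.0, 2.0, 4.0, 8.0):
        print(d, circle_of_confusion(0.025, 0.05, 2.0, d))
```

The blur vanishes for points at the focus distance and grows with the aperture diameter, which is the sense in which adjusting a virtual aperture and focus distance controls the depth-of-field effect.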
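Generative adversarial networks (GANs) have been trained to be professional artists able to create stunning artworks, as in face generation and image style transfer. In this paper, we focus on a realistic business scenario: the automated generation of customizable icons given desired mobile applications and theme styles. We first introduce a theme-application icon dataset, termed AppIcon, in which each icon has two orthogonal labels, theme and app. By investigating a strong baseline, StyleGAN2, we observe mode collapse caused by the entanglement of the orthogonal labels. To address this challenge, we propose IconGAN, composed of a conditional generator and dual discriminators with orthogonal augmentations, and we further design a contrastive feature disentanglement strategy to regularize the feature space of the two discriminators. Compared with other approaches, IconGAN shows a clear advantage on the AppIcon benchmark. Further analysis also confirms the effectiveness of disentangling app and theme representations. Our project will be released at: https://github.com/architect-road/icongan.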